What is AutoGen?
Microsoft Research's framework for building multi-agent AI systems that can reason, collaborate, and execute code.
AutoGen is an open-source framework from Microsoft Research that lets you build systems where multiple AI agents work together to solve complex tasks. Think of it as a runtime for AI teamwork: agents can talk to each other, use tools, write and execute code, and ask humans for input.
Unlike workflow tools (n8n, Zapier) that execute deterministic steps, AutoGen agents reason autonomously. They decide how to solve a problem, not just follow a pre-written path.
Why does AutoGen exist?
The Big Picture
The core building blocks are AssistantAgent, UserProxyAgent, and GroupChat. This course uses v0.4 patterns.

AutoGen vs. The World
| Framework | Paradigm | Best For |
|---|---|---|
| AutoGen | Autonomous multi-agent conversation | Complex reasoning, code generation, research tasks |
| LangGraph | Stateful graph-based workflows | Fine-grained control over agent state & branching |
| CrewAI | Role-based agent teams | Business automation with defined roles |
| n8n | Deterministic workflow automation | Integrating SaaS tools with predictable logic |
Core Concepts
The building blocks: agents, conversations, termination, and the LLM config pattern.
The Two Primary Agents
Every AutoGen system is built from two fundamental agent types: AssistantAgent (the LLM-powered reasoner) and UserProxyAgent (which executes code and stands in for the human). Understanding the division of labor between them is the most important concept in this entire course.
LLM Configuration
All agents that use an LLM require a config dict. This is where you define the model and API key.
```python
import autogen

# Define your LLM config
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o",
            "api_key": "sk-...",  # or use env var
        }
    ],
    "temperature": 0.1,  # low = more deterministic
    "cache_seed": 42,    # reproducible runs (optional)
}

# Or load from a JSON file (recommended for production)
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
```
Conversations & Termination
A conversation begins when one agent initiates a chat with another. Agents take turns. This continues until a termination condition is met.
```python
# Termination methods:

# 1. Max turns
user_proxy.initiate_chat(assistant, max_turns=5)

# 2. Keyword in reply ("TERMINATE")
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="""...(your instructions)...
When done, reply with: TERMINATE"""
)

# 3. Custom function
def my_termination(msg):
    return "task_complete" in msg["content"].lower()

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=my_termination,
)
```
Human Input Modes
| Mode | Behavior | Use Case |
|---|---|---|
| ALWAYS | Asks a human to reply at every step | Interactive sessions, demos |
| TERMINATE | Asks human only when termination is triggered | Approval gate at the end |
| NEVER | Fully autonomous, no human prompting | Production pipelines |
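The table above can be condensed into a tiny decision function. This is a hypothetical sketch (not AutoGen code, and `needs_human_input` is not a real API) of when each mode pauses for a human:

```python
# Hypothetical sketch (not AutoGen's API): when does each
# human_input_mode pause the conversation for a human reply?
def needs_human_input(mode: str, termination_triggered: bool) -> bool:
    if mode == "ALWAYS":
        return True                   # ask at every step
    if mode == "TERMINATE":
        return termination_triggered  # approval gate at the end
    return False                      # "NEVER": fully autonomous
```

Note that `TERMINATE` mode only interrupts once, right before the chat would end, which is what makes it suitable as a final approval gate.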
Security note: use code_execution_config with a Docker sandbox or a restricted local path in production. Never run untrusted agent code on bare metal.

Your First Agents
Build a working two-agent system in about 15 lines of Python.
Installation
```bash
# Install AutoGen (v0.4+)
pip install pyautogen

# For code execution in Docker (recommended)
pip install "pyautogen[docker]"

# Set your API key
export OPENAI_API_KEY="sk-..."
```
Hello World: Two-Agent System
```python
import autogen

# 1. LLM config
llm_config = {
    "config_list": [{"model": "gpt-4o", "api_key": "sk-..."}]
}

# 2. The AI assistant agent
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="""You are a helpful Python expert.
When you finish the task, reply with: TERMINATE"""
)

# 3. The user proxy (executes code, no human input)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # set True in production
    }
)

# 4. Start the conversation!
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that prints the first 10 Fibonacci numbers."
)
```
What Happens Step by Step

1. user_proxy sends the task message to assistant.
2. assistant replies with a Python code block.
3. user_proxy extracts the code, executes it in work_dir, and sends back the output.
4. assistant checks the output; once satisfied, it replies with TERMINATE.
5. is_termination_msg detects the keyword and the chat ends.
Conversation Patterns
Two-agent, sequential chaining, nested chats, and when to use each.
Pattern 1: Two-Agent (Default)
You've seen this. One user_proxy, one assistant. Best for focused single tasks: code generation, Q&A, analysis.
Pattern 2: Sequential Chaining
Run multiple two-agent conversations where the output of one feeds into the next. Useful for pipelines (write → review → deploy).
```python
# Step 1: Writer agent creates a blog post
result1 = user_proxy.initiate_chat(
    writer,
    message="Write a blog post about RAG systems"
)
draft = result1.summary

# Step 2: Critic reviews it
result2 = user_proxy.initiate_chat(
    critic,
    message=f"Review this blog post:\n\n{draft}"
)

# Step 3: Editor applies improvements
result3 = user_proxy.initiate_chat(
    editor,
    message=f"Apply this feedback:\n{result2.summary}"
)
```
Pattern 3: Nested Chats
An agent can spawn a sub-conversation mid-conversation. This is powerful for tasks that require a specialist to handle a sub-task before the main flow continues.
```python
# Register a nested chat: when assistant gets a coding task,
# it spawns a full sub-conversation with a coding specialist
assistant.register_nested_chats(
    trigger=user_proxy,
    chat_queue=[
        {
            "recipient": coding_specialist,
            "message": "Please implement this: ",
            "summary_method": "last_msg",
            "max_turns": 3,
        }
    ]
)
```
Pattern 4: Swarm (v0.4)
AutoGen 0.4 introduced a Swarm pattern: agents can hand off to each other dynamically based on context, like a call-routing system.
```python
from autogen import SwarmAgent, initiate_swarm_chat

# Each agent defines who it can hand off to
triage = SwarmAgent(
    name="triage",
    handoffs=["billing", "tech_support", "sales"]
)

initiate_swarm_chat(
    initial_agent=triage,
    agents=[triage, billing, tech_support, sales],
    messages="I can't access my account after payment failed"
)
```
Tool Use & Function Calling
Give agents real-world capabilities: web search, database queries, API calls.
By default, agents can only reason and write code. Tools give them real-world capabilities. AutoGen wraps OpenAI function calling: agents decide when and how to call tools.
Defining Tools with Decorators
```python
import autogen
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False  # disable code exec, use tools instead
)

# Register a tool: the function is callable by user_proxy, and the
# description is shown to the LLM so it knows when to call it
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Get current weather for a city")
def get_weather(city: str) -> str:
    # Call a real weather API here
    return f"Weather in {city}: 22°C, sunny"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search the web for current info")
def web_search(query: str) -> str:
    # Integrate with Serper, Tavily, or Bing here
    return f"Results for '{query}': ..."

user_proxy.initiate_chat(
    assistant,
    message="What's the weather in Tokyo and is there a tech conference there this week?"
)
```
How Tool Calling Works

1. The assistant's LLM receives the registered tool schemas with each request.
2. When a tool fits the task, the assistant replies with a tool call (function name + JSON arguments) instead of plain text.
3. user_proxy matches the call to the registered function, executes it, and returns the result as a message.
4. The assistant reads the result and continues reasoning, possibly calling more tools.
Tool Best Practices
Type hints matter. AutoGen uses Python type hints to generate the JSON schema for the LLM. Always annotate your function arguments and return type.
Descriptions are prompts. The description string is what the LLM reads to decide when to call your tool. Write it like a clear instruction, not a comment.
Return strings. Tool functions should return strings (or JSON-serializable types that AutoGen will stringify). The LLM reads this as text.
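To see why type hints matter, here is a hypothetical helper (not AutoGen's actual internals; `tool_schema` and `PY_TO_JSON` are made up for illustration) showing how a framework can derive the LLM-facing JSON schema only from your annotations:

```python
# Hypothetical sketch of schema generation from type hints.
# Unannotated parameters would carry no type information at all.
import inspect
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    hints = get_type_hints(fn)
    hints.pop("return", None)
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": fn.__doc__ or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": PY_TO_JSON.get(tp, "string")}
                for name, tp in hints.items()
            },
            # parameters without defaults are required
            "required": [
                name for name, p in sig.parameters.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }

def get_weather(city: str, units: str = "celsius") -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22 degrees ({units}), sunny"

schema = tool_schema(get_weather)
```

Here the docstring plays the role of the tool description, which is exactly why "descriptions are prompts": the LLM sees nothing but this schema when deciding whether to call your function.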
Quick check: why does the assistant need register_for_llm while the user_proxy needs register_for_execution? The assistant's LLM only proposes tool calls; the user_proxy is the agent that actually runs the function.

Group Chat
Orchestrate 3+ specialized agents collaborating on a shared task.
Group chat is where AutoGen really shines for complex tasks. You assemble a team of specialized agents (researcher, coder, critic, planner) and a GroupChatManager decides which agent speaks next.
Setting Up a Group Chat
```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "..."}]}

# Specialized agents with distinct system messages
planner = autogen.AssistantAgent(
    name="Planner",
    llm_config=llm_config,
    system_message="You break complex tasks into subtasks and assign them."
)
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
    system_message="You write high-quality Python code. No prose, just code."
)
critic = autogen.AssistantAgent(
    name="Critic",
    llm_config=llm_config,
    system_message="Review code for bugs, edge cases, and style. Be concise."
)
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding"}
)

# Assemble the group chat
groupchat = autogen.GroupChat(
    agents=[user_proxy, planner, coder, critic],
    messages=[],
    max_round=12,
    speaker_selection_method="auto"  # LLM picks next speaker
)

# The manager orchestrates the conversation
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

user_proxy.initiate_chat(
    manager,
    message="Build a FastAPI endpoint that analyzes sentiment in text"
)
```
Speaker Selection Methods
| Method | How It Works | Best For |
|---|---|---|
| auto | GroupChatManager (LLM) picks the most relevant agent | General purpose, flexible tasks |
| round_robin | Agents take turns in order | Structured review loops |
| random | Random selection each turn | Exploration, diversity of views |
| manual | Human picks each time | Interactive debugging |
| custom fn | Your function selects the speaker | Complex routing logic |
Custom Speaker Selection
```python
def custom_speaker_selector(last_speaker, groupchat):
    # Always follow coder with critic
    if last_speaker.name == "Coder":
        return next(a for a in groupchat.agents if a.name == "Critic")
    # Always follow critic with user_proxy (to run fixed code)
    if last_speaker.name == "Critic":
        return next(a for a in groupchat.agents if a.name == "User")
    return "auto"  # let LLM decide otherwise

groupchat = autogen.GroupChat(
    ...,
    speaker_selection_method=custom_speaker_selector
)
```
Cost tip: group chats multiply token usage. Set max_round limits, write concise system messages, and consider GroupChat(messages=[], send_introductions=False) to keep context lean.

Memory & RAG
Give agents long-term memory with vector stores and retrieval-augmented generation.
By default, AutoGen agents have no memory between conversations. Every initiate_chat call starts fresh. For production systems you need persistent memory, and AutoGen provides RetrieveUserProxyAgent plus hooks for external vector stores.
Built-in RAG: RetrieveUserProxyAgent
```python
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

rag_agent = RetrieveUserProxyAgent(
    name="rag_agent",
    retrieve_config={
        "task": "qa",
        "docs_path": ["./my_docs/", "https://example.com/api-docs"],
        "chunk_token_size": 2000,
        "model": "gpt-4o",
        "vector_db": "chroma",  # or "pgvector", "qdrant"
        "collection_name": "my_docs",
        "get_or_create": True,  # reuse existing collection
    },
    code_execution_config=False,
    human_input_mode="NEVER"
)

rag_agent.initiate_chat(
    assistant,
    problem="What does our API return when authentication fails?"
)
```
Memory Architecture Patterns
Custom Memory via Tools
```python
import chromadb

chroma_client = chromadb.Client()
memory_collection = chroma_client.get_or_create_collection("agent_memory")

# Give agents tools to read/write memory
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Save a fact to long-term memory")
def save_memory(key: str, value: str) -> str:
    memory_collection.add(documents=[value], ids=[key])
    return f"Saved: {key}"

@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Search memory for relevant facts")
def search_memory(query: str) -> str:
    results = memory_collection.query(query_texts=[query], n_results=3)
    return str(results["documents"])
```
AutoGen vs. Other Frameworks
When to use AutoGen, when to use something else, and how to combine them.
Decision Framework
| Scenario | Best Pick | Why |
|---|---|---|
| AI writes & debugs code autonomously | AutoGen | Code execution loop + multi-agent review is AutoGen's sweet spot |
| Research: gather, analyze, synthesize | AutoGen | Autonomous reasoning + tool use + multi-agent collaboration |
| Strict step-by-step business workflow | LangGraph / n8n | Deterministic control flow with explicit state management |
| Role-based teams (PM, dev, QA) | CrewAI | First-class role/goal/task primitives |
| SaaS integration automation | n8n | 500+ no-code connectors, trigger-based workflows |
| Complex RAG over many data sources | LlamaIndex + AutoGen | LlamaIndex handles retrieval; AutoGen handles agentic reasoning |
Complementary Architectures
The real power comes from combining these frameworks, not choosing one.
LlamaIndex retrieves relevant docs
→ AutoGen multi-agent reasons + writes response
→ n8n sends reply via Gmail connector
→ SQL DB logs outcome
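The flow above can be sketched as a pipeline with stubbed stages. Every function here is a placeholder invented for illustration (none is a real LlamaIndex, AutoGen, or n8n API); each stub marks where one framework would slot in:

```python
# Illustrative pipeline sketch with hypothetical stubs for each stage.
def retrieve_docs(query: str) -> list:
    # LlamaIndex's job: semantic retrieval over your knowledge base
    return ["Refund policy: 30 days", "Shipping: 3-5 business days"]

def reason_and_draft(query: str, docs: list) -> str:
    # AutoGen's job: multi-agent reasoning over the retrieved context
    return f"Draft reply (grounded in {len(docs)} documents)"

def send_reply(to: str, body: str) -> bool:
    # n8n's job: deliver the reply via the Gmail connector
    return True

def log_outcome(record: dict) -> dict:
    # SQL DB's job: persist the outcome for auditing
    return record

def handle_email(sender: str, query: str) -> dict:
    docs = retrieve_docs(query)
    reply = reason_and_draft(query, docs)
    sent = send_reply(sender, reply)
    return log_outcome({"sender": sender, "sent": sent, "reply": reply})

result = handle_email("customer@example.com", "Where is my refund?")
```

The point of the sketch is the seams: each stage has a narrow input/output contract, so you can swap any framework at any stage without touching the others.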
AutoGen Strengths & Weaknesses
Strengths:
• Flexible conversation patterns
• Strong Microsoft ecosystem integration
• Human-in-the-loop at any granularity
• Active research + rapid updates

Weaknesses:
• Token costs can escalate in group chats
• Debugging multi-agent loops is hard
• v0.4 API still maturing
• Less enterprise-grade tooling than LangGraph